Corpus: guj_wikipedia_2021_100K

4.3.1.5 Number of Word-N-grams at Sentence Beginnings

Number of distinct word N-grams (N = 1 to 5) that begin the first K sentences of the corpus
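The statistic described above can be sketched as follows. This is a minimal illustration, not the tool that produced this report: the function name `count_initial_ngrams`, the whitespace tokenization, and the choice to skip sentences shorter than N words are all assumptions made for the example.

```python
from typing import List, Set, Tuple


def count_initial_ngrams(sentences: List[str], k: int, max_n: int = 5) -> List[int]:
    """Count distinct sentence-initial word N-grams (N = 1..max_n)
    among the first k sentences.

    Assumptions (not taken from the report): tokens are split on
    whitespace, and a sentence with fewer than N tokens contributes
    nothing to the N-gram count.
    """
    # One set of distinct initial N-grams per value of N.
    seen: List[Set[Tuple[str, ...]]] = [set() for _ in range(max_n)]
    for sentence in sentences[:k]:  # slicing is safe when k exceeds the list length
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            if len(tokens) >= n:
                seen[n - 1].add(tuple(tokens[:n]))
    return [len(s) for s in seen]


# Hypothetical sample data, for illustration only.
sample = [
    "the cat sat on the mat",
    "the cat ran far away today",
    "a dog barked at the cat",
    "the cat sat on the sofa",
]
print(count_initial_ngrams(sample, k=4))  # [2, 2, 3, 3, 3]
```

Because the slice stops at the end of the list, asking for more sentences than are available returns the same counts as using all of them.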


[Figure: Zipf's diagram for sentence beginnings (Gnuplot)]

      K  # of words  # of bigrams  # of trigrams  # of 4-grams  # of 5-grams
    100          48            93             99            99            99
   1000         471           904            987           994           998
  10000        2766          8126           9701          9927          9976
 100000       30305         80803          96871         99055         99600
1000000       30306         80804          96872         99056         99601
8004 msec needed at 2021-07-10 23:12